Real time traffic sign recognition

ABSTRACT

Advanced driver assistance systems need to be able to recognize and to classify traffic signs under real time constraints, and under a wide variety of visual conditions. The invention shown employs binary masks extracted by color space segmentation, with a different binary mask generated for each sign shape. Temporal tracking is employed to add robustness to the detection system. The system is generic, and is trainable to the signs used in various countries.

CLAIM OF PRIORITY

This application claims priority under 35 U.S.C. 119(e)(1) to Indian Provisional Application No. 201641000153 filed Jan. 4, 2016.

TECHNICAL FIELD OF THE INVENTION

The technical field of this invention is image processing.

BACKGROUND OF THE INVENTION

Traffic sign recognition (TSR) is a technology which makes vehicles capable of recognizing the traffic signs appearing in the vicinity of the driving path. TSR systems form an important part of the ADAS (advanced driver assistance systems) that are currently being deployed in the cars of today. It is a classic example of rigid object detection. TSR systems depend on forward facing image sensors. Current TSR systems are aimed at assisting the driver in the driving process. But in the future, TSR systems will play a crucial role in the functioning of autonomous cars.

Computers face many challenges in identifying traffic signs in images, for the following reasons:

-   Within-class variability. The same traffic sign in the real world can give rise to different images due to:
    -   Different viewing positions and different distances between the camera and traffic sign positions,
    -   Photometric effects: positions of multiple different light sources, their color, distribution of shadows, and view obstruction by objects present near the traffic signs.
-   Between-class similarity: different classes of traffic sign may look very much alike.
-   Background objects in cluttered urban environments also pose a challenge,
-   Motion blur in images
-   Faded, bent, dirty sign boards
-   Adverse weather conditions like rain and snow.

Traffic signs may also be slightly different from country to country. For example, speed limit traffic signs in some European countries are round with a red circle boundary, while in the US they are rectangular in shape.

SUMMARY OF THE INVENTION

A real time Traffic Sign Recognition (TSR) system is described comprising a preprocessing stage to identify image regions containing a traffic sign, a localization stage to accurately locate the sign within the image, a categorization stage to categorize the located sign into one of the sign categories, and a temporal smoothening stage to remove noise and false detections due to noise.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects of this invention are illustrated in the drawings, in which:

FIG. 1 illustrates the block diagram of the TSR system;

FIG. 2 illustrates the segmentation stage of the TSR system;

FIG. 3 shows the flow chart of the ERST step;

FIG. 4 illustrates the computed feature planes;

FIG. 5 shows various positions of the models inside the image;

FIG. 6 illustrates a depth=2 decision tree;

FIG. 7 illustrates a block diagram of the temporal smoothening engine; and

FIG. 8 shows a flow chart of the temporal smoothening engine.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

A four stage TSR algorithm is described below. It is also shown pictorially in FIG. 1.

Stage 1: Preprocessing Stage 101

Identify the approximate image regions having traffic signs without missing any traffic sign in input images.

Stage 2: Accurate Localization Stage 102

Stage 2 a: Extract features from input images in 103

Stage 2 b: Accurate localization of the region of the traffic sign within the image using a classifier in 104.

Stage 3: Classification Stage 105

The windows localized by stage 2 are categorized into one of the categories.

Stage 4: Temporal Smoothening 106

This stage is meant to remove the noisy detections and noisy classifications that have been obtained from stage 3.

The preprocessing stage works on the input image and is aimed at reducing the complexity of the TSR system by reducing the amount of data that is processed by subsequent stages. It is implemented in two steps:

Extract color cues to find possible locations of traffic signs.

A shape detector uses these color cues to identify image locations having traffic signs.

As shown in FIG. 2, the RGB input image to the preprocessing stage is spatially subsampled by a factor of four in both directions in 202, and converted to YUV in 203. This is to keep the complexity of the segmentation stage within manageable limits for an embedded system implementation.
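The subsampling and color conversion steps can be illustrated with a minimal sketch. It assumes plain decimation (keeping every fourth pixel, with no low-pass filtering) and BT.601 conversion weights; neither choice is specified in this description, and the function names `subsample`, `rgb_to_yuv` and `preprocess` are hypothetical.

```python
def subsample(image, factor=4):
    """Keep every `factor`-th pixel in both directions (assumed: no filtering)."""
    return [row[::factor] for row in image[::factor]]


def rgb_to_yuv(pixel):
    """Convert one (R, G, B) pixel to (Y, U, V) using BT.601 weights (assumed)."""
    r, g, b = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return (y, u, v)


def preprocess(image, factor=4):
    """Subsample an RGB image (list of rows of RGB tuples), then convert to YUV."""
    return [[rgb_to_yuv(p) for p in row] for row in subsample(image, factor)]
```

With a factor of four in both directions, the subsequent stages process one sixteenth of the original pixel count.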

Contrast stretching is done in 204 by using histogram equalization on the Y plane. This improves the performance of the algorithm in many low contrast input images.
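As an illustration of this step, the following sketch applies textbook histogram equalization to a Y plane. It assumes integer luma values in the range 0 to 255 and a plain list-of-rows image layout; the function name `equalize_histogram` is hypothetical and an embedded implementation may differ.

```python
def equalize_histogram(y_plane, levels=256):
    """Histogram-equalize a Y plane given as rows of integers in [0, levels)."""
    hist = [0] * levels
    for row in y_plane:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of the luma values.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)

    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Flat image: every pixel has the same value, nothing to stretch.
        return [row[:] for row in y_plane]

    # Look-up table mapping each input level onto the full output range.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[value] for value in row] for row in y_plane]
```

A low contrast plane whose values span only 100 to 120 is stretched so that its darkest level maps to 0 and its brightest to 255.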

Red, Blue, Yellow and White binary masks are extracted by thresholding in YUV color space (1 mask for each color) in 205.

Morphological opening (erosion followed by dilation) is applied in 206 for each of these binary masks.

The masks are combined in 207.

The binary masks are used by the extended radial symmetry transform (ERST) in 208. ERST detects circles, triangles, squares and octagons in the input images by performing voting for the gradients present in regions of the mask.
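The thresholding and opening steps can be sketched as follows. The per-color YUV threshold ranges are not given in this description, so any values passed to `threshold_mask` are hypothetical, as is the 3×3 structuring element used for the erosion and dilation.

```python
def threshold_mask(yuv, lo, hi):
    """Return 1 where every YUV component lies inside [lo, hi], else 0."""
    return [[int(all(l <= c <= h for c, l, h in zip(pixel, lo, hi)))
             for pixel in row] for row in yuv]


def _morph(mask, op):
    """Apply `op` (min for erosion, max for dilation) over a 3x3 neighborhood."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(height, y + 2))
                      for i in range(max(0, x - 1), min(width, x + 2))]
            out[y][x] = op(window)
    return out


def erode(mask):
    return _morph(mask, min)


def dilate(mask):
    return _morph(mask, max)


def opening(mask):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(mask))
```

Opening removes isolated mask pixels (typically noise) while leaving larger connected regions, such as a sign face, essentially unchanged.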

FIG. 3 shows the flow chart of the ERST.

In 301, a gradient map for the entire image in grey scale is computed using the Sobel operator.

In 302, the binary masks obtained from color space thresholding act as (color) cues for this stage.

The gradients that are less than a threshold are zeroed out in 303 and are not considered in later stages.

The voting is performed in a 3D accumulator array (x,y,r) 304. One 3D accumulator array is maintained for each shape (circle, square, triangle, and octagon).

Voting (the incrementing procedure of accumulator cells) is performed only for the gradient (edge) points for which the binary value in the mask is non-zero.

After voting finishes for the entire image in 305, the top ‘N’ peaks in each accumulator are used in 306 to determine the position and radius of the circle/polygon at that point.
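The voting procedure can be illustrated for the circle case only with the simplified sketch below. It is not the full ERST: the polygon accumulators are omitted, the gradients are assumed to be precomputed (for example by a Sobel operator) and supplied as a dictionary, and each edge point votes for candidate centers at distance r along its gradient direction in both senses. The names `vote_circles` and `top_peaks` and the sparse dictionary accumulator are hypothetical.

```python
import math


def vote_circles(gradients, mask, radii, grad_threshold):
    """Accumulate circle-center votes in a sparse (x, y, r) accumulator.

    gradients: dict mapping (x, y) -> (gx, gy); mask: rows of 0/1 values.
    """
    height, width = len(mask), len(mask[0])
    acc = {}
    for (x, y), (gx, gy) in gradients.items():
        mag = math.hypot(gx, gy)
        # Skip weak gradients and points outside the color mask.
        if mag == 0 or mag < grad_threshold or not mask[y][x]:
            continue
        ux, uy = gx / mag, gy / mag
        for r in radii:
            for sign in (1, -1):
                cx = round(x + sign * r * ux)
                cy = round(y + sign * r * uy)
                if 0 <= cx < width and 0 <= cy < height:
                    acc[(cx, cy, r)] = acc.get((cx, cy, r), 0) + 1
    return acc


def top_peaks(acc, n):
    """Return the n accumulator cells with the most votes."""
    return sorted(acc.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Edge points lying on a common circle all vote for the same (x, y, r) cell, so the cell at the true center and radius collects the highest count.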

Feature extraction Stage 2 a is performed by:

-   For each input image, an image pyramid is prepared. The number of image scales used is dependent on:
    -   Maximum and minimum traffic sign size to be detected
    -   Input image dimensions
    -   Complexity, accuracy tradeoff considerations
-   Aggregate Channel Features (ACF) planes are computed for every scale of each image, as shown in FIG. 4.
-   ACF is a collection of 10 feature planes: 3 channels comprising the original pixels of YUV space, 1 gradient magnitude channel and 6 orientation channels of “histogram of oriented gradients (HOGs)”.
-   Each HOG orientation channel used as a part of ACF is computed from a cell size of 4×4 pixels without any overlap between cells and without block normalization.

Traffic sign localization Stage 2 b is performed by:

An ADA boost (Adaptive Boosting) classifier is used for this localization. Boosting is an approach to machine learning based on the idea of creating a highly accurate prediction rule by combining many relatively weak and inaccurate rules.

1024 decision trees of depth 2 act as weak classifiers for ADA boost. A single weak classifier is depicted in FIG. 6.

Features computed from 32×32 pixel blocks of images (known as a model) are used as inputs to the classifier. The model is made to step by 4 pixels (both horizontally and vertically) on each image and each scale, as shown in FIG. 5. At each position of the model 501 a feature vector of size 640 pixels is computed using the feature planes.

Feature vectors obtained in this manner from training images are used for training the ADA boost classifier. Training is done in 4 stages with 32, 128, 256 and 1024 weak classifiers used in the successive stages. Bootstrapping is used in each stage to strengthen the hypothesis.

The feature vector of size 640 pixels is fed to the ADA boost classifier. The ADA boost classifier returns a real number which is binary thresholded to decide if a traffic sign (TS) is present or not. Note that the localization procedure is only a binary decision procedure where it is decided if a traffic sign is present or not. Actual classification (categorization to a specific class) is done in the next stage.
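The feature vector size of 640 is consistent with one reading of the figures given above: if all 10 feature planes are aggregated over 4×4 pixel cells, a 32×32 model covers 8×8 cells per plane. The sketch below works through that arithmetic and enumerates the model positions for a 4 pixel step. The aggregation of every plane at the cell size is an assumption, and the function names are hypothetical.

```python
def feature_vector_length(model_size=32, cell_size=4, num_channels=10):
    """Number of features for one model position (assumed cell aggregation)."""
    cells_per_side = model_size // cell_size
    return cells_per_side * cells_per_side * num_channels


def window_positions(width, height, model_size=32, step=4):
    """Top-left corners of every model position inside a width x height scale."""
    return [(x, y)
            for y in range(0, height - model_size + 1, step)
            for x in range(0, width - model_size + 1, step)]
```

For example, `feature_vector_length()` evaluates to 8 × 8 × 10 = 640, and a 64×48 pyramid level yields 9 × 5 = 45 model positions.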
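The binary decision described above can be sketched as a sum of depth 2 decision tree outputs followed by a threshold. The tuple layout used for each tree, the assumption that the leaf values already include the boosting weights, and the default threshold of zero are all hypothetical; the description does not specify them.

```python
def tree_output(tree, features):
    """Evaluate one depth-2 tree.

    tree = (root_index, root_threshold, (left_child, right_child)), where each
    child is (feature_index, threshold, leaf_if_below, leaf_if_not_below).
    """
    root_index, root_threshold, children = tree
    child = children[0] if features[root_index] < root_threshold else children[1]
    index, threshold, leaf_below, leaf_not_below = child
    return leaf_below if features[index] < threshold else leaf_not_below


def adaboost_score(trees, features):
    """Real-valued score: sum of the (assumed pre-weighted) tree outputs."""
    return sum(tree_output(tree, features) for tree in trees)


def contains_sign(trees, features, threshold=0.0):
    """Binary localization decision obtained by thresholding the score."""
    return adaboost_score(trees, features) > threshold
```

In the system described here, `trees` would hold 1024 entries and `features` the 640 element vector computed at one model position.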

Traffic sign classification Stage 3 is done by:

-   The windows that are marked as containing traffic signs in Stage 2 b are passed to the next stage for categorization. Stage 2 b is designed with maximum sensitivity in mind, i.e. no valid traffic sign should be missed but a few false positives are acceptable. These false positives are filtered out by Stage 3.
-   The feature vector used in Stage 3 is of size 2992 pixels and is used as an input to the Linear Discriminant Analysis (LDA) classifier.
-   LDA relies on minimization of the Mahalanobis distance between a feature vector and the mean vector of various classes. The Mahalanobis distance of an observation

$x = (x_1, x_2, x_3, \ldots, x_N)^T$

from a group of observations with mean

$\mu = (\mu_1, \mu_2, \mu_3, \ldots, \mu_N)^T$

and covariance matrix S is defined as:

$D_M(x) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}$

Minimization of the Mahalanobis distance is mathematically equivalent to minimization of the function below

$g_i(x) = w_i^t x + w_{i0}$

where $g_i(x)$→cost function for class ‘i’

-   $w_i$→weight vector for class ‘i’
-   $w_{i0}$→bias for class ‘i’
-   x is a vector of size 2992 pixels.
-   $w_i$ and $w_{i0}$ are pre-computed (during training) and are different for different classes. For a given feature vector x, $g_i(x)$ is computed for each class and the feature vector is associated with the class that gives the minimum value of the function $g_i(x)$.
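The classification rule above can be sketched as follows, assuming the weight vectors and biases have already been obtained from training. The dictionary layout of `models` and the function names are hypothetical; the feature vector here has two elements purely for illustration, whereas the system described uses 2992.

```python
def discriminant(weights, bias, x):
    """Evaluate g_i(x) = w_i^t x + w_i0 for one class."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(models, x):
    """Return the class id whose cost function g_i(x) is smallest.

    models: dict mapping class id -> (weight vector, bias).
    """
    return min(models, key=lambda cid: discriminant(*models[cid], x))
```

Because each class needs only one dot product and one addition, the cost of this stage grows linearly with the number of classes and with the feature vector length.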

Temporal smoothening Stage 4 is performed by:

Removing the noisy detections and noisy classifications that have been obtained from the earlier stages. This stage is present only when the input is a sequence of images that form part of a single video.

The temporal smoothening engine is conceptually depicted in FIG. 7. The inputs to the temporal smoothening engine are:

The descriptors of detection windows 701 (position and dimensions) obtained from stage 2.

Class id's 702 that are associated with each of these detection windows obtained from stage 3.

The temporal smoothening engine internally maintains a history of the detection windows. This history is empty at the start of the sequence of pictures and is updated after every picture. The decision logic block inside the engine looks at the inputs and the history before finalizing the windows and their associated classes.

It uses the Jaccard coefficient to measure the degree of similarity between windows detected in the current picture and the windows stored in the history. The Jaccard coefficient J(A, B) between two windows A and B is defined as follows,

$J(A, B) = \frac{|A \cap B|}{|A \cup B|}.$

The numerator denotes the area of the intersection and the denominator denotes the area of the union of the two windows.
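For axis-aligned rectangular windows, the Jaccard coefficient can be computed as in the sketch below. The (x, y, width, height) window representation and the function name `jaccard` are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient of two windows given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap rectangle (zero when disjoint).
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = iw * ih
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0
```

Two 10×10 windows offset horizontally by 5 pixels share an area of 50 out of a union of 150, giving a coefficient of one third; identical windows give 1 and disjoint windows give 0.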

The details of the temporal smoothening engine are shown in FIG. 8. Inputs to the temporal smoothening engine are the detection windows (det_win[]) 801 output from stage 2 of the TSR algorithm, and the class id (id[]) 802 for each detection window. In 803, hist[] is the state memory that is built when a new picture is processed. The Jaccard coefficient is computed in 804 for every pair of windows, with one window selected from hist[] and the second from det_win[]. In 805 det_idx is set to zero, and in 806 we find the hist[best_match_hist_idx] that gives the maximum value of J, J_max, when paired with det_win[det_idx]. If in 807 J_max is greater than 0.5, hist[best_match_hist_idx] is stored into det_win[det_idx], and id[det_idx] is associated with the same entry of hist[] in 808. If J_max is less than or equal to 0.5 in 807, det_win[best_match_det_idx] is added to hist[] as a new entry, and id[best_match_det_idx] is stored with the same entry of hist[] in 809. In 810 we determine if all entries of det_win[] have been processed. If not, det_idx is incremented in 811, and the flow returns to 806. If all entries have been processed, all hist[] entries that have not been updated are deleted in 812.
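One possible reading of this flow is sketched below. Several details are assumptions rather than part of the description: windows are (x, y, width, height) tuples, a matched history entry is refreshed with the current window, each entry accumulates the class ids seen so far, and the reported class is the most frequent id of the entry. The names `smooth` and `jaccard` and the dictionary layout of history entries are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard coefficient of two (x, y, width, height) windows."""
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    union = a[2] * a[3] + b[2] * b[3] - iw * ih
    return iw * ih / union if union > 0 else 0.0


def smooth(hist, det_win, ids, threshold=0.5):
    """Process one picture; hist is the state memory and is updated in place."""
    for entry in hist:
        entry["updated"] = False
    for win, class_id in zip(det_win, ids):
        # Find the history entry that best matches this detection window.
        best, j_max = None, 0.0
        for entry in hist:
            j = jaccard(entry["win"], win)
            if j > j_max:
                best, j_max = entry, j
        if best is not None and j_max > threshold:
            best["win"] = win                 # assumed: refresh matched entry
            best["ids"].append(class_id)
            best["updated"] = True
        else:
            hist.append({"win": win, "ids": [class_id], "updated": True})
    # Delete history entries that were not updated by this picture.
    hist[:] = [entry for entry in hist if entry["updated"]]
    # Assumed decision logic: report the most frequent class id per entry.
    return [(e["win"], Counter(e["ids"]).most_common(1)[0][0]) for e in hist]
```

Under these assumptions, a window tracked over three pictures with class ids "stop", "stop" and "yield" is reported as "stop" in the third picture, which suppresses the single noisy classification.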

The output of the temporal smoothening engine in 813 and 814 is used as the final output of the TSR system.

What is claimed is:
 1. A method of traffic signal recognition comprising the steps of: identifying the approximate image regions containing a traffic sign; extracting features from the input images; localizing the region in the image containing the traffic sign; classifying said traffic sign into one of the sign categories; applying temporal smoothening to the image.
 2. The method of claim 1 wherein: color clues are extracted from said approximate image locations to find possible locations of traffic signs; image locations containing traffic signs are identified by a shape detector.
 3. The method of claim 1 wherein: the input image is spatially subsampled in both directions, and contrast is enhanced by histogram equalization.
 4. The method of claim 1 wherein: red, blue, yellow and white binary masks are extracted by thresholding in YUV color space, followed by morphological opening (erosion followed by dilation) of each of said binary masks.
 5. The method of claim 1 wherein: said binary masks are combined and sign shapes are detected by using an extended radial symmetry transform followed by an adaptive boost classifier.
 6. The method of claim 1 wherein: detected sign shapes are further categorized by using a Linear Discriminant Analyzer.